Constrained K-Means Clustering
Authors
Abstract
We consider practical methods for adding constraints to the K-Means clustering algorithm in order to avoid local solutions with empty clusters or clusters having very few points. We often observe this phenomenon when applying K-Means to datasets where the number of dimensions is n ≥ 10 and the number of desired clusters is k ≥ 20. We propose explicitly adding k constraints to the underlying clustering optimization problem, requiring that each cluster contain at least a minimum number of points. We then investigate the resulting cluster assignment step. Preliminary numerical tests on real datasets indicate that the constrained approach is less prone to poor local solutions and produces a better summary of the underlying data.
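One way to picture the constrained cluster assignment step is as a minimum-cost flow problem: every point sends one unit of flow to some cluster, and every cluster must absorb at least a minimum number of units. The sketch below is an illustration under our own assumptions, not necessarily the authors' implementation: it uses squared Euclidean distance, a minimum cluster size `tau`, a hand-written successive-shortest-path solver, and hypothetical helper names (`MinCostFlow`, `constrained_assign`, `constrained_kmeans`). Surplus points beyond the `k * tau` mandatory slots are routed through a single overflow node, so a maximum flow is forced to fill every cluster's mandatory slots.

```python
# Minimum-size-constrained k-means sketch. Assumptions (ours): squared Euclidean
# distance, minimum cluster size `tau`, hand-written min-cost-flow solver.
from typing import Dict, List, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class MinCostFlow:
    """Successive shortest paths; Bellman-Ford copes with negative residual costs."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.graph: List[List[list]] = [[] for _ in range(n)]  # edge = [to, cap, cost, rev]

    def add_edge(self, u: int, v: int, cap: int, cost: float) -> Tuple[int, int]:
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])
        return u, len(self.graph[u]) - 1  # handle to the forward edge

    def solve(self, s: int, t: int) -> Tuple[int, float]:
        flow, cost = 0, 0.0
        while True:
            dist = [float("inf")] * self.n
            prev: List = [None] * self.n  # (node, edge index) used to reach each node
            dist[s] = 0.0
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for i, (v, cap, c, _rev) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                            dist[v], prev[v], changed = dist[u] + c, (u, i), True
                if not changed:
                    break
            if dist[t] == float("inf"):
                return flow, cost  # no augmenting path left
            push, v = float("inf"), t
            while v != s:  # bottleneck capacity along the shortest path
                u, i = prev[v]
                push, v = min(push, self.graph[u][i][1]), u
            v = t
            while v != s:  # augment and update the reverse residual edges
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def constrained_assign(points, centers, tau: int) -> Tuple[List[int], float]:
    """Assign each point to one center so every cluster gets >= tau points, minimizing total squared distance."""
    m, k = len(points), len(centers)
    if k * tau > m:
        raise ValueError("infeasible: k * tau exceeds the number of points")
    src, over, sink = 0, m + k + 1, m + k + 2  # points are nodes 1..m, clusters m+1..m+k
    net = MinCostFlow(m + k + 3)
    handles: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for i, x in enumerate(points):
        net.add_edge(src, 1 + i, 1, 0.0)
        for j, c in enumerate(centers):
            handles[i, j] = net.add_edge(1 + i, m + 1 + j, 1, _sq_dist(x, c))
    for j in range(k):
        net.add_edge(m + 1 + j, sink, tau, 0.0)  # tau mandatory slots of cluster j
        net.add_edge(m + 1 + j, over, m, 0.0)    # any extra points of cluster j
    net.add_edge(over, sink, m - k * tau, 0.0)   # sink capacity totals m, so a max flow fills every mandatory slot
    flow, cost = net.solve(src, sink)
    assert flow == m
    labels: List[int] = []
    for i in range(m):
        for j in range(k):
            u, e = handles[i, j]
            if net.graph[u][e][1] == 0:  # saturated arc: point i sent its unit to cluster j
                labels.append(j)
                break
    return labels, cost


def constrained_kmeans(points, centers, tau: int, max_iter: int = 50):
    """Alternate the constrained assignment with mean updates until the centers stop changing."""
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        labels, _ = constrained_assign(points, centers, tau)
        new = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else old)
        if new == centers:
            break
        centers = new
    return labels, centers
```

For example, with `points = [(0, 0), (0, 1), (1, 0), (10, 10)]`, `centers = [(0, 0), (10, 10)]` and `tau = 2`, the nearest-center rule would put three points in the first cluster, while `constrained_assign` moves one of `(0, 1)` or `(1, 0)` to the second cluster for a total cost of 182. With `tau = 0` the step reduces to ordinary nearest-center assignment.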
Similar resources
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and the social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into existing clustering algorithms. One of the well-researched constrained clustering algorithms is called microaggregation. In a microaggreg...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other, in some sense, than to those in other groups. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used in so many fields, much research is still devoted to finding the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of the cluster centers and can therefore become trapped in a local optimum. In this paper...
Clustering Using Boosted Constrained k-Means Algorithm
This article proposes a constrained clustering algorithm whose performance is competitive with state-of-the-art methods at a lower computation time; it consists of a constrained k-means algorithm enhanced by the boosting principle. Constrained k-means clustering that uses constraints as background knowledge, although easy to implement and quick, has insufficient performance compared with metric learn...
Constrained clustering with k-means
We introduce a k-means-type clustering in the presence of cannot-link and must-link constraints. First, we apply a BIRCH-type methodology to eliminate must-link constraints. Next, we introduce a penalty function to replace the cannot-link constraints. When the penalty values increase to +∞, the original cannot-link constraints are recovered. The preliminary numerical experiments show that constraints h...
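The idea of replacing cannot-link constraints with a penalty can be illustrated on a toy example. This is a hedged sketch, not that paper's method: the linear penalty per violated pair, the fixed centers, and the exhaustive search are all our assumptions, and the BIRCH-type treatment of must-link constraints is not reproduced. The helper names `penalized_objective` and `best_labels` are hypothetical.

```python
import itertools


def penalized_objective(points, centers, labels, cannot_link, penalty):
    """Within-cluster squared error plus `penalty` per cannot-link pair placed in the same cluster."""
    sse = sum(sum((a - b) ** 2 for a, b in zip(p, centers[lab])) for p, lab in zip(points, labels))
    violations = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return sse + penalty * violations


def best_labels(points, centers, cannot_link, penalty):
    """Exhaustive search over every labeling; only usable for tiny illustrative inputs."""
    candidates = itertools.product(range(len(centers)), repeat=len(points))
    return min(candidates, key=lambda lab: penalized_objective(points, centers, lab, cannot_link, penalty))


pts = [(0, 0), (0, 1), (10, 10)]
ctr = [(0, 0.5), (10, 10)]
cl = [(0, 1)]  # points 0 and 1 must not share a cluster
print(best_labels(pts, ctr, cl, penalty=1.0))     # (0, 0, 1): small penalty, constraint violated
print(best_labels(pts, ctr, cl, penalty=1000.0))  # (0, 1, 1): large penalty, constraint respected
```

With a small penalty the cheapest labeling keeps points 0 and 1 together (cost 0.5 + 1.0), but once the penalty exceeds the extra squared error of separating them (181.25 versus 0.5 + penalty), the optimum satisfies the cannot-link constraint, mirroring the limiting argument in the abstract.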
k-Means Clustering via the Frank-Wolfe Algorithm
We show that k-means clustering is a matrix factorization problem. Seen from this point of view, k-means clustering can be computed using alternating least squares techniques and we show how the constrained optimization steps involved in this procedure can be solved efficiently using the Frank-Wolfe algorithm.
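The matrix factorization view can be checked numerically. In the row convention used here (our choice; the paper may transpose everything), the rows of X are the data points, Z is a one-hot indicator matrix with one row per point, and M holds the cluster centers as rows; the k-means objective then equals the squared Frobenius norm of X - ZM, and for a fixed Z the least-squares M is (ZᵀZ)⁻¹ZᵀX, i.e. the cluster means. The helper names are hypothetical, and the Frank-Wolfe step itself is not shown.

```python
def one_hot(labels, k):
    """Indicator matrix Z: row i has a single 1 in the column of point i's cluster."""
    return [[1.0 if lab == j else 0.0 for j in range(k)] for lab in labels]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def frobenius_sq(A, B):
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def kmeans_sse(X, M, labels):
    """Classic k-means objective: squared distance from each point to its assigned center."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, M[lab])) for x, lab in zip(X, labels))


def centers_from_Z(X, Z):
    """For fixed Z, (Z^T Z)^{-1} Z^T X; Z^T Z is diagonal with cluster sizes (assumed non-zero)."""
    ZtX = matmul([list(col) for col in zip(*Z)], X)
    counts = [sum(col) for col in zip(*Z)]
    return [[v / n for v in row] for row, n in zip(ZtX, counts)]


X = [[0, 0], [0, 1], [10, 10]]
labels = [0, 0, 1]
Z = one_hot(labels, 2)
M = centers_from_Z(X, Z)  # [[0.0, 0.5], [10.0, 10.0]]
print(kmeans_sse(X, M, labels), frobenius_sq(X, matmul(Z, M)))  # 0.5 0.5
```

Alternating between updating Z (assignment) with M fixed and updating M with Z fixed is exactly the usual k-means iteration written as alternating least squares.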